NSF PAR Search | NSF Public Access Repository

hu.MAP3.0: Atlas of human protein complexes by integration of > 25,000 proteomic experiments

https://doi.org/10.1101/2024.10.11.617930

Fischer, Samantha N; Claussen, Erin R; Kourtis, Savvas; Sdelci, Sara; Orchard, Sandra; Hermjakob, Henning; Kustatscher, Georg; Drew, Kevin (October 2024, bioRxiv)

Abstract Macromolecular protein complexes carry out most functions in the cell, including essential functions required for cell survival. Unfortunately, we lack the subunit composition for all human protein complexes. To address this gap, we integrated >25,000 mass spectrometry experiments using a machine learning approach to identify >15,000 human protein complexes. We show our map of protein complexes is highly accurate and more comprehensive than previous maps, placing ~75% of human proteins into their physical contexts. We globally characterize our complexes using protein co-variation data (ProteomeHD.2) and identify co-varying complexes, suggesting common functional associations. Our map also generates testable functional hypotheses for 472 uncharacterized proteins, which we support using AlphaFold modeling. Additionally, we use AlphaFold modeling to identify 511 mutually exclusive protein pairs in hu.MAP3.0 complexes, suggesting complexes serve different functional roles depending on their subunit composition. We identify expression as the primary way cells and organisms relieve the conflict of mutually exclusive subunits. Finally, we import our complexes to EMBL-EBI’s Complex Portal (https://www.ebi.ac.uk/complexportal/home) as well as provide complexes through our hu.MAP3.0 web interface (https://humap3.proteincomplexes.org/). We expect our resource to be highly impactful to the broader research community.
Full Text Available
Complex portal 2025: predicted human complexes and enhanced visualisation tools for the comparison of orthologous and paralogous complexes

https://doi.org/10.1093/nar/gkae1085

Balu, Sucharitha; Huget, Susie; Medina Reyes, Juan Jose; Ragueneau, Eliot; Panneerselvam, Kalpana; Fischer, Samantha N; Claussen, Erin R; Kourtis, Savvas; Combe, Colin W.; Meldal, Birgit H M; et al (November 2024, Nucleic Acids Research)

Abstract The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated reference database for molecular complexes. It is a unifying web resource linking aggregated data on composition, topology and the function of macromolecular complexes from 28 species. In addition to significantly extending the number of manually curated complexes, we have massively extended the coverage of the human complexome through the incorporation of high confidence assemblies predicted by machine-learning algorithms trained on large-scale experimental data. The current content of the portal comprising 2150 human complexes has been augmented by 14 964 machine-learning (ML) predicted complexes from hu.MAP3.0. We have refactored the website to enable easy search and filtering of these different classes of protein complexes and have implemented the Complex Navigator, a visualisation tool to facilitate comparison of related complexes in the context of orthology or paralogy. We have embedded the Rhea reaction visualisation tool into the website to enable users to view the catalytic activity of enzyme complexes.
UniProt: the Universal Protein Knowledgebase in 2025

https://doi.org/10.1093/nar/gkae1010

The UniProt Consortium; Bateman, Alex; Martin, Maria-Jesus; Orchard, Sandra; Magrane, Michele; Adesina, Aduragbemi; Ahmad, Shadab; Bowler-Barnett, Emily H; Bye-A-Jee, Hema; Carpentier, David; et al (November 2024, Nucleic Acids Research)

Abstract The aim of the UniProt Knowledgebase (UniProtKB; https://www.uniprot.org/) is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication, we describe ongoing changes to our production pipeline to limit the sequences available in UniProtKB to high-quality, non-redundant reference proteomes. We continue to manually curate the scientific literature to add the latest functional data and use machine learning techniques. We also encourage community curation to ensure key publications are not missed. We provide an update on the automatic annotation methods used by UniProtKB to predict information for unreviewed entries describing unstudied proteins. Finally, updates to the UniProt website are described, including a new tab linking protein to genomic information. In recognition of its value to the scientific community, the UniProt database has been awarded Global Core Biodata Resource status.
Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 3.0

https://doi.org/10.1021/acs.jproteome.9b00542

Deutsch, Eric W.; Lane, Lydie; Overall, Christopher M.; Bandeira, Nuno; Baker, Mark S.; Pineau, Charles; Moritz, Robert L.; Corrales, Fernando; Orchard, Sandra; Van Eyk, Jennifer E.; et al (October 2019, Journal of Proteome Research)

Full Text Available
UniProt: the Universal Protein Knowledgebase in 2023

https://doi.org/10.1093/nar/gkac1052

Bateman, Alex; Martin, Maria-Jesus; Orchard, Sandra; Magrane, Michele; Ahmad, Shadab; Alpi, Emanuele; Bowler-Barnett, Emily H; Britto, Ramona; Bye-A-Jee, Hema; Cukura, Austra; et al (November 2022, Nucleic Acids Research)

Abstract The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues their contributions of publications and annotations to UniProt entries of their interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users’ experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.
Full Text Available
The Gene Ontology knowledgebase in 2023

https://doi.org/10.1093/genetics/iyad031

Aleksander, Suzi A; Balhoff, James; Carbon, Seth; Cherry, J Michael; Drabkin, Harold J; Ebert, Dustin; Feuermann, Marc; Gaudet, Pascale; Harris, Nomi L; Hill, David P; et al (March 2023, GENETICS)

Abstract The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO—a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations—evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs)—mechanistic models of molecular “pathways” (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.
Full Text Available
The Gene Ontology resource: enriching a GOld mine

https://doi.org/10.1093/nar/gkaa1113

Carbon, Seth; Douglass, Eric; Good, Benjamin M; Unni, Deepak R; Harris, Nomi L; Mungall, Christopher J; Basu, Siddartha; Chisholm, Rex L; Dodson, Robert J; Hartline, Eric; et al (December 2020, Nucleic Acids Research)
Abstract The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings, and, to maintain consistency with other ontologies. As a result, 20 000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.
Full Text Available
UniProt: the universal protein knowledgebase in 2021

https://doi.org/10.1093/nar/gkaa1100

Bateman, Alex; Martin, Maria-Jesus; Orchard, Sandra; Magrane, Michele; Agivetova, Rahat; Ahmad, Shadab; Alpi, Emanuele; Bowler-Barnett, Emily H; Britto, Ramona; Bursteinas, Borisas; et al (November 2020, Nucleic Acids Research)

Abstract The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
Full Text Available
